PRELIMINARY AMENDMENT
Prior to examination, please amend the application as follows:

In the Claims : 

Please CANCEL claims 1-25 without prejudice or disclaimer thereof. 

Please ADD the following new claims: 

--26. An audio conferencing method comprising:

receiving audio data from a source audio client; 

attenuating the received audio data based on audio decay characteristics to simulate a
distance between the source audio client and a target audio client; and

delivering the attenuated audio data to the target audio client.

27. The method of claim 26, wherein the target audio client is the same as the source 
audio client. 

28. The method of claim 26, wherein the target audio client is different than the
source audio client. 

29. The method of claim 28, further comprising delivering the attenuated data to the 
source audio client. 

30. The method of claim 26, wherein the source and target audio clients are displayed
as points on a viewing screen from which sound appears to emanate. 

31. The method of claim 30, wherein the source audio client comprises a point source 
audio (PSA) client that originates from stored audio data. 



32. The method of claim 31, wherein the PSA includes point sources of sound from a
file or user input. 

33. The method of claim 30, wherein the source audio client comprises a set-top box
(STB) audio client that originates from an audio conferencing user.

34. The method of claim 33, wherein the STB includes a set-top application for
controlling audio data from a microphone or speaker. 

35. The method of claim 30, wherein the target audio client comprises a set-top box
(STB) audio client that originates from an audio conferencing user. 

36. The method of claim 35, wherein the STB includes a set-top application for
controlling audio data from a microphone or speaker. 

37. The method of claim 26, wherein a plurality of audio clients participate in an
audio conference. 

38. The method of claim 26, further comprising managing one or more audio 
conferences using an Interface Definition Language (IDL) that creates and deletes conferences, 
adds and removes participants to and from the conferences, and changes a volume balance 
among participants in the conferences.

39. The method of claim 26, wherein attenuating comprises identifying a decay factor 
for each audio client. 

40. The method of claim 39, wherein the decay factor is a customized decay factor. 

41. The method of claim 39, wherein attenuating further comprises determining a
weighted value between the source audio client and the target audio client based on the source
audio client's decay factor. 

42. The method of claim 41, wherein attenuating further comprises calculating a mix for 
the audio clients using the weighted values.

43. The method of claim 42, wherein attenuating further comprises refining the mix
for the audio clients by adjusting a plurality of audio data functions such as gain control, fade
in/fade out, floating point operation elimination, mixing adaption, mixing cut-off, and stream
audio.



44. Computer software, stored on a computer-readable medium, for an audio 
conference server (ACS), the software comprising instructions for causing a computer processor 
to perform the following operations: 

receive audio data from a source audio client;

attenuate the received audio data based on audio decay characteristics to simulate a 
distance between the source audio client and a target audio client; and

deliver the attenuated audio data to the target audio client.--

REMARKS 

Applicant submits that all of the claims are now in condition for examination, which
action is requested. Please apply any charges or credits to Deposit Account No. 06-1050. 

Respectfully submitted,

Spatialized Audio in a Three-Dimensional,
Computer-Based Scene 



Inventor: Shinya Matsuoka 

Background of the Invention 

Field of the Invention 

The present invention relates generally to audio conferencing, and more 
particularly to spatial audio in a computer-based scene. 

Related Art 

An audio conference consists of an environment shared by viewers using 
the same application over an interactive TV network. In a typical application, 
viewers move graphic representations (sometimes referred to as personas or 
avatars) on the screen interactively using a remote control or game pad. 
Viewers use their set-top microphones and TV speakers to talk and listen to 
other viewers and to hear sounds that are intended to appear to come from 
specific locations on the screen. 

Conferencing software that supports real-time voice communications
over a network is becoming very common in today's society. A distinguishing
feature between different conferencing software programs is an ability to
support spatialized audio, i.e., the ability to hear sounds relative to the location
of the listener, the same way one does in the real world. Many
non-spatialized audio conferencing software products, such as NetMeeting,
manufactured by Microsoft Corp., Redmond, WA, and Intel Corp., North
Bend, WA; CoolTalk, manufactured by Netscape Communications Corp.,
Mountain View, CA; and TeleVox, manufactured by Voxware Inc., Princeton,
NJ, are rigid. They do not provide distance-based attenuation (i.e., sounds
are not heard relative to the distance between persona locations on the TV
screen during the conference). Non-spatialized audio conferencing software
does not address certain issues necessary for performing communications in
computer scenery. Such issues include: (1) efficient means for joining and
leaving a conference; and (2) provision for distance attenuation and other 
mechanisms to provide the illusion of sounds in real space. 

Spatialized audio conference software does exist. An example is 
Traveler, manufactured by OnLive! Technologies, Cupertino, CA., but such 
software packages exist mainly to navigate 3D space. Although they attempt 
to spatialize the audio with reference to human representatives in the scene, a 
sound's real world behavior is not achieved. 

As users navigate through a computer-based scene such as a Virtual 
Reality Modeling Language (VRML) "world", they should be able to hear
(and to broadcast to other users) audio sounds emanating from sources within
the scene. Current systems typically do not do a very good job of realistically
modeling sounds. As a result, the sounds are not heard relative to the user's
current location as in the real world. 

What is needed is a system and method for providing audio 
conferencing that provides realistic sounds that appear to emanate from 
positions in the scene relative to the location of the user's avatar on the TV 
screen. 

Summary of the Invention 

Briefly stated, the present invention is directed to a system and method 
for enabling an audio conference server (ACS) to provide an application 
program with multi-point, weight controllable audio conferencing functions.
The present invention achieves realistic sound by providing distance-based
attenuation. The present invention associates an energy level with the sound
(e.g., weak, medium, strong, etc.) to define the sound within the scene
according to the sound's real world behavior.

The ACS manages a plurality of audio conferences, receives audio data 
from a plurality of audio clients, mixes the audio data to provide distance-based 
attenuation and decay characteristics of sounds, and delivers the mixed audio 
data to a plurality of audio clients. Audio clients include set-top box audio 
clients and point source audio (PSA) audio clients. Set-top box audio clients 
can be set-top boxes, computer workstations, and/or personal computers (PCs). 
PSA audio clients include audio files and audio input lines. 

The ACS mixes the audio data by identifying a decay factor. Pre- 
defined decay factors include an audio big decay factor, an audio small decay 
factor, an audio medium decay factor, and a constant decay factor. One can 
also develop a customized decay factor. A weighted value for a source audio 
client based on the identified decay factor and the distance between the source 
audio client and a target audio client is determined. A mix table is generated 
using the weighted values for each source/target audio client pair. Then, an 
actual mix value for each target audio client is calculated using the weighted 
values from the mix table. The present invention also includes means for 
refining the actual mix value. 
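
The mixing steps summarized above can be illustrated with a minimal sketch. It is not the disclosed implementation: the linear falloff, the numeric decay values, and every name used (`DECAY_FACTORS`, `weight`, `build_mix_table`, `mix_for_target`) are assumptions made for illustration only. The pre-defined decay curves themselves are those of FIG. 7.

```python
import math

# Hypothetical decay factors: the distance at which a sound becomes inaudible.
# These numbers are placeholders, not the curves of FIG. 7.
DECAY_FACTORS = {"small": 10.0, "medium": 50.0, "big": 200.0, "constant": math.inf}


def weight(decay, distance):
    """Weighted value in [0, 1] for one source/target pair (assumed linear falloff)."""
    reach = DECAY_FACTORS[decay]
    if math.isinf(reach):
        return 1.0  # constant decay: same volume at any distance
    return max(0.0, 1.0 - distance / reach)


def build_mix_table(clients):
    """clients maps name -> (x, y, decay); returns {target: {source: weight}}."""
    table = {}
    for target, (tx, ty, _) in clients.items():
        row = {}
        for source, (sx, sy, decay) in clients.items():
            if source == target:
                continue  # assumption: a client is not mixed into its own output
            row[source] = weight(decay, math.hypot(sx - tx, sy - ty))
        table[target] = row
    return table


def mix_for_target(row, samples):
    """Actual mix value for one target: weighted sum of the sources' samples."""
    return sum(w * samples[source] for source, w in row.items())
```

Each row of the table corresponds to one target audio client, so a change in one client's position only requires recomputing the weights that involve that client.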

The ACS manages the audio conferences using an ACS shell. The ACS 
shell is a user interface that provides interactive program access to the ACS 
using high level methods for creating and managing a proxy audio conference
and for creating and managing point source audios. The ACS shell also 
provides program access to the ACS via low level methods for creating and 
managing audio conferences. 

The ACS also checks the status of a registered owner of each audio 
conference using a resource audit service (RAS). The RAS informs the ACS 
when the registered conference owner stops running. Then, the ACS closes the
conference. 

Further features and advantages of the invention, as well as the structure 
and operation of various embodiments of the invention, are described in detail 
below with reference to the accompanying drawings. In the drawings, like
reference numbers generally indicate identical, functionally similar, and/or
structurally similar elements. The drawing in which an element first appears
is indicated by the digit(s) to the left of the two rightmost digits in the 
corresponding reference number. 

Brief Description of the Figures 

The present invention will be described with reference to the 
accompanying drawings, wherein: 

FIG. 1 is a block diagram of an audio conferencing network 
environment according to a preferred embodiment of the present invention; 

FIG. 2 is a block diagram of a computer system useful for implementing 
the present invention; 

FIG. 3 is a diagram representing the threads involved in audio 
conferencing for the present invention; 

FIG. 4 is an exemplar process of an audio conference service's 
conference owner thread; 

FIGS. 5A and 5B represent a flow diagram of the process of changing 
a registered owner of an audio conference; 

FIG. 6 represents a diagram representing the play back of a PSA; 
FIG. 7 represents a graph showing pre-defined decay factors for four 
categories of sounds; 

FIG. 8 is an exemplary mix table with audio mix equations for target 
audio clients;

FIG. 9A is a flow diagram representing the functionality of the mixer
thread;

FIG. 9B is a flow diagram representing the audio mixing process;

FIG. 10 is a diagram representing program access and internal interfaces
to the audio conference classes of the ACS shell;

FIG. 11 is a list of the methods contained in an ACProxy class;

FIG. 12 is a list of the methods contained in a PointSourceAudio class;

FIG. 13 is a list of the methods contained in an AudioConferenceService
class;

FIG. 14 is a list of the methods contained in an AudioConference class;

FIG. 15 is a flow diagram representing the addition of an audio client 
to a proxy audio conference; and 

FIG. 16 is an exemplary flow diagram representing an audio conference 
in a service application using the lower level methods of the AudioConference
and AudioConferenceService classes.

Detailed Description of the Preferred Embodiments 

The preferred embodiment of the present invention is discussed in detail 
below. While specific configurations are discussed, it should be understood 
that this is done for illustration purposes only. A person skilled in the relevant
art will recognize that other components and configurations may be used
without departing from the spirit and scope of the invention.

Overview of the Invention 

The present invention is directed to a system and method for enabling an 
audio conference server (ACS) to provide multi-point, weight controllable audio
conferencing functions to application programs. The present invention allows
application programs to incorporate audio conference features without being
concerned with the details of audio processing, audio delivery, and audio data
mixing. The present invention allows multiple applications to share one
conference. Applications can also use a conference created by another
application. The present invention allows viewers in an audio conference to hear
sounds and talk to other viewers across an interactive TV network.

In an application service that incorporates the ACS, the ACS enables the
application service to have audio clients. Audio clients are displayed as points
on a TV screen from which sound appears to emanate. Approaching a source of
sound makes the sound grow louder. Moving away from the source of sound
makes the sound grow fainter. Audio clients can be point sources of sound,
referred to as point source audios (PSAs), from audio files or an audio input line.
Viewers having conversations with others on a network using a set-top box (STB)
microphone and TV speaker are another type of audio client, often referred to as
a set-top box (STB) audio client. The STB audio client includes a set-top
application for controlling an audio stream of data emanating from the STB
microphone or the TV speaker.

The set-top application, application service, and the ACS typically reside
on separate systems, thus requiring a network communications set-up among
them. The ACS manages the data exchange and audio mixing among clients and
servers. The ACS uses a CORBA (Common Object Request Broker
Architecture) Interface Definition Language (IDL) interface for setting up an
audio conference, and a User Datagram Protocol/Internet Protocol (UDP/IP)
interface for transmitting and receiving audio streams of data. Both CORBA IDL
and UDP/IP interfaces are well known to persons skilled in the relevant art(s).

The ACS receives audio streams of data from STB and PSA audio clients, 
mixes the audio streams of data to enable distance-based attenuation of sounds 
as well as decay characteristics according to the sound's behavior, and delivers 
the mixed audio streams to the designated STB audio clients. Audio mixing
adjusts the volume of a persona's voice (from a STB) or sound emanating from
a PSA relative to the distance between the location of the sound and the audio
client's persona on the TV screen: the greater the distance, the fainter the audio.
Also, if the sound is a low-energy sound, such as a wind chime, the sound will
decay quickly. If the sound is a high-energy sound, such as a waterfall, the sound
will decay slowly.

A user can interactively interface with the ACS using an ACS shell. The 
ACS shell, written in TCL (a tool command language developed by John
Ousterhout of the University of California, Berkeley), is an IDL-based client that

connects to the ACS. The ACS shell enables developers to prototype 
applications, and operators to monitor and control audio conferences. 

Figure 1 is a block diagram of an exemplary audio conferencing network 
environment 100 in which the present invention is implemented. The audio
conferencing network 100 includes a headend server 102 and a plurality of
workstations 120. The workstations 120 are connected to the headend server 102
via a communication bus (not explicitly shown). A typical communication bus
architecture includes any one of coaxial, fiber-optic, and 10BaseT cabling, all of
which are well known to persons skilled in the relevant art(s). 

The headend server 102 houses an audio conference server (ACS) 104, an
application service 106, and an ACS shell 110. The headend server 102 may also
contain PSAs 108. The application service 106 is an application that
incorporates the ACS 104. PSAs 108 can be audio files or analog input lines.
The ACS 104 enables the application service 106 to incorporate audio
conference features without being concerned with the details of audio processing,
audio delivery, and audio data mixing. The ACS shell 110 is an IDL client that
connects to the ACS 104 to provide a user interactive interface to monitor and
control the full range of ACS functions.

Each workstation 120 contains a set-top box (STB) 112 and a TV 122.
The TV 122 includes a TV screen 125 and a speaker 126. The TV screen 125 
displays the audio clients (PSAs 108 and/or STBs 112) that are included in the 
audio conference while the sounds emanating from the displayed audio clients 
(PSAs 108 and/or STBs 112) are heard by a viewer 124 via the TV speaker 126. 
The STB 112 contains a set-top application 114, an ACS client library 116, and 
a microphone 118. The ACS client library 116 contains application program 
interfaces (APIs) that are exported to the set-top application 114. The APIs 
enable the set-top application 114 to control an audio stream of data emanating 
from the microphone 118 or the TV speaker 126. An ACAudioClient class 
contains the API interface between the set-top application 114 and the ACS 
client library 116. The ACAudioClient class contains methods that enable set- 
top applications 114 to join or leave an audio conference, start and stop audio 
conferencing ability for STB 112 audio clients, and to control and monitor 
audio talking. 

The ACS 104 enables the application service 106 to have audio clients.
The functionality of the ACS 104 is two-fold. First, the ACS 104 manages the 
audio stream of data. For example, audio sounds emanating from each STB 
112 are interfaced to the ACS 104 over the communication bus using a UDP/IP
interface 128. When a viewer 124 speaks into the microphone 118, the
viewer's voice is imported from the microphone 118 to the set-top box 112 in
a digitized pulse code modulation (PCM) format. PCM is a digital transmission
technique by which a continuous signal is periodically sampled, quantized, and
transmitted in digital binary code. PCM is well known to persons skilled in the 
relevant art(s). (Note that when a viewer is not speaking, silence suppression 
is provided to avoid flooding the network.) The ACS client library 116 sends 
the PCM audio data stream as UDP/IP data packets 128 over the 
communication bus to the ACS 104 in the headend server 102. UDP/IP
protocol is used for real-time audio data delivery. The ACS 104 mixes the
audio data stream accordingly, and sends the mixed audio data stream to each
designated viewer's set-top TV speaker 126 via UDP/IP over the
communication bus through the appropriate STBs 112. Each designated viewer
hears the sound relative to that viewer's position to the sound on the TV screen
125 according to the real life characteristics of the sound. If the set-top
application 114 has turned on the local echo feature, the viewer's voice is also
sent locally to the viewer's set-top speaker 126. The echo is in real time; there
is no network delay.
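
The sampling, quantization, and silence suppression described above can be sketched as follows. This is an illustration under stated assumptions, not the set-top implementation: the 16-bit sample width, the amplitude threshold, the 160-sample frame size, and the function names `pcm_encode`, `is_silent`, and `frames_to_send` are all hypothetical.

```python
def pcm_encode(signal, sample_rate, num_samples, bits=16):
    """Sample a continuous signal (a function of time in seconds) and quantize it."""
    peak = 2 ** (bits - 1) - 1
    samples = []
    for i in range(num_samples):
        value = signal(i / sample_rate)        # periodic sampling
        value = max(-1.0, min(1.0, value))     # clamp to the representable range
        samples.append(round(value * peak))    # quantize to a signed integer
    return samples


def is_silent(frame, threshold=200):
    """Assumed silence test: every sample stays below a small amplitude."""
    return all(abs(s) < threshold for s in frame)


def frames_to_send(samples, frame_size=160):
    """Yield only non-silent frames, so a quiet microphone sends no packets."""
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if not is_silent(frame):
            yield frame
```

Dropping silent frames before they reach the network is what keeps a conference with many idle microphones from flooding the communication bus.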

Second, the ACS 104 manages the audio conference. Conference
management is performed using IDL. The application service 106
communicates with the ACS 104 over the communication bus using an IDL
interface 130. The set-top application 114 communicates with the application
service 106 over the communication bus using an IDL interface 131.
Communications from the set-top application 114 that are relevant to audio
conference management are translated by the application service 106 into a
conference management command and passed to the ACS 104 via IDL interface
130. The ACS 104 handles IDL requests from the application service 106.

The IDL interface 130 comprises software methods that, when executed,
perform various ACS functions. These methods, when called by the
application service 106, are executed transparently. The methods perform such
functions as creating and deleting a conference, adding and removing
participants to and from the conference, respectively, changing the volume mix
balance between participants in the conference, etc. Although IDL is relatively
slow, IDL was chosen for its reliability.
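
The kinds of conference management operations listed above can be pictured with a small in-memory sketch. The class `ConferenceRegistry` and its method names are hypothetical stand-ins chosen for illustration. They are not the IDL methods of the ACS, and the remote-invocation layer is omitted entirely.

```python
class ConferenceRegistry:
    """Hypothetical in-memory stand-in for conference management operations."""

    def __init__(self):
        self._conferences = {}  # conference id -> {participant: volume weight}
        self._next_id = 1

    def create_conference(self):
        conf_id = self._next_id
        self._next_id += 1
        self._conferences[conf_id] = {}
        return conf_id

    def delete_conference(self, conf_id):
        del self._conferences[conf_id]

    def add_participant(self, conf_id, participant, volume=1.0):
        self._conferences[conf_id][participant] = volume

    def remove_participant(self, conf_id, participant):
        del self._conferences[conf_id][participant]

    def set_volume(self, conf_id, participant, volume):
        # Change the volume balance for one participant already in the conference.
        if participant not in self._conferences[conf_id]:
            raise KeyError(participant)
        self._conferences[conf_id][participant] = volume

    def participants(self, conf_id):
        return dict(self._conferences[conf_id])
```

An application service would call such operations on behalf of a set-top application, for example creating a conference, adding two viewers, and then lowering one viewer's volume.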

Implementation of the Invention

The headend server 102 is preferably implemented using a computer
system, such as exemplary computer system 200 shown in Figure 2.
Alternatively, the headend server 102 comprises a plurality of computer systems,
each like computer system 200. In an alternate embodiment, the application
service 106 and the PSA 108 are implemented using a single computer system,
and the ACS 104 is a separate computer system. In another embodiment, the
application service 106 is implemented using a single computer system, and the
ACS 104 and the PSA 108 are implemented on a separate computer system. 
Other distributions of the application service 106, the ACS 104, and the PSA 108 
among computer systems are within the scope and spirit of the present invention. 

The computer system 200 includes one or more processors, such as 
processor 202. The processor 202 is connected to a communication bus 204. The 
computer system 200 also includes a main memory 206, preferably random 
access memory (RAM), and a secondary memory 208. The secondary memory 
208 includes, for example, a hard disk drive 210 and/or a removable storage drive 
212, representing a floppy disk drive, a magnetic tape drive, a compact disk drive,
etc. The removable storage drive 212 reads from and/or writes to a removable 
storage unit 214 in a well known manner. 

Removable storage unit 214, also called a program storage device or a 
computer program product, represents a floppy disk, magnetic tape, compact disk, 
etc. The removable storage unit 214 includes a computer usable storage medium 
having stored therein computer software and/or data, such as an object's methods 
and data. 

The computer system 200 can communicate with other computer systems 
via network interface 215. The network interface 215 is a network interface
circuit card that connects computer system 200 to other computer systems via
network 216. The other computer systems can be computer systems such as
computer system 200, set-top box audio clients 112, or PCs and/or workstations.
The network 216 can be one of fiber optics, coaxial, or 10BaseT.

Computer programs (also called computer control logic), including object- 
oriented computer programs, are stored in main memory 206 and/or the 
secondary memory 208. Such computer programs, when executed, enable the 
computer system 200 to perform the features of the present invention as discussed 
herein. In particular, the computer programs, when executed, enable the 
processor 202 to perform the features of the present invention. Accordingly, such 
computer programs represent controllers of the computer system 200. 

In another embodiment, the invention is directed to a computer program
product comprising a computer readable medium having control logic (computer
software) stored therein. The control logic, when executed by the processor 202,
causes the processor 202 to perform the functions of the invention as described 
herein. 

In yet another embodiment, the invention is implemented primarily in 
hardware using, for example, one or more state machines. Implementation of 
these state machines to perform the functions described herein will be apparent 
to persons skilled in the relevant art(s). 

ACS Threads 

The ACS 104 is a multi-threaded process having multiple threads, each 
thread executing a separate audio conference function. The ACS 104 contains 
a thread for managing audio conferences, a thread for monitoring the system, 
a thread for keeping track of conference ownership, a thread for receiving 
audio data from audio clients, and a thread for mixing and delivering audio 
data. 

Figure 3 is a diagram 300 representing the multiple threads of the ACS
104. Upon initialization, the ACS 104 creates three threads. The first thread
is a conference manager thread 302 that handles incoming IDL method calls
from the application service 106. The second thread is a conference recycler
thread 304 that monitors the system and performs appropriate garbage
collection. The third thread is a resource audit service (RAS) pinger thread 306
that checks the status of the registered owner of a conference.

Users interactively interface with the ACS 104 via the ACS shell. The
ACS shell enables program access to low level IDL methods. The conference
manager thread 302 handles all incoming IDL method calls from the application
service 106. The low level IDL methods that are handled by the conference
manager thread 302 are discussed below (under ACS Shell).

The ACS 104 needs to know if the application that created an audio
conference is still running. When the audio conference is generated and used
by the same application, the application reports that it is alive by pinging the
ACS 104. The conference recycler thread 304 monitors the pinging. When the
pinging stops, the conference recycler thread 304 terminates the audio
conference.
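
The liveness check described above amounts to a heartbeat with a timeout. A minimal sketch follows, assuming a fixed timeout and a periodic sweep. The class name `ConferenceRecycler`, the `timeout` parameter, and the injectable `clock` are illustrative assumptions and do not come from the ACS itself.

```python
import time


class ConferenceRecycler:
    """Hypothetical heartbeat monitor: a conference expires when pings stop."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout   # seconds of silence tolerated (assumed value)
        self.clock = clock       # injectable so the logic can be tested
        self.last_ping = {}      # conference id -> time of the most recent ping

    def ping(self, conf_id):
        """Called whenever the owning application reports that it is alive."""
        self.last_ping[conf_id] = self.clock()

    def sweep(self):
        """Return, and forget, every conference whose pings have stopped."""
        now = self.clock()
        expired = [c for c, t in self.last_ping.items() if now - t > self.timeout]
        for conf_id in expired:
            del self.last_ping[conf_id]
        return expired
```

A recycler thread would call `sweep()` periodically and terminate each conference it returns.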

When the audio conference is generated by one application and used by
a different application, the above method of pinging the conference recycler
thread 304 is not applicable. For example, a game launcher application
generates a conference that is used by a game. To accommodate this situation,
the RAS pinger thread 306 is used. The RAS pinger thread 306 is responsible
for pinging a resource audit service (RAS) to obtain information about the
existence of the registered owner of a conference. Conference ownerships are
registered with the RAS via a registered owner interface. Using the RAS
pinger thread 306, the RAS informs the ACS 104 when the registered
conference owner stops running. The incorporation of the RAS pinger thread
306 enables an application claiming ownership of an audio conference to pass
audio conference ownership to another application using the registered owner
interface.

An audio conference ownership transfer will now be discussed with 
reference to Figures 4, 5A and 5B. Figure 4 is an exemplar process 400 
representing a conference ownership transfer using the RAS pinger thread 306. 
Two applications are shown in Figure 4. The first application is a box office 
application 402. The second application is a theatre application 404. The box 
office application 402 sells tickets to viewers. Since no one can enter the 
theatre without purchasing a ticket from the box office, the box office 
application 402 creates the audio conference and registers itself as the owner 
of the conference with the ACS 104. Once the tickets have all been sold and 
the attraction is ready to begin, the viewers enter the theatre room. At this time 
the box office application 402 passes audio conference ownership to the theatre 
application 404 and registers the theatre application 404 as the owner of the 
conference with the ACS 104. 

The ACS 104 keeps a lookup table 406 in which each audio conference 
has a matching entry 408. The matching entry 408 is the registered owner of 
the conference. When the theatre application 404 terminates, the ACS 104 
detects the termination and closes the conference. 

The RAS 410 keeps track of a registered application. The RAS 410 
pings the registered application using IDL and also checks for its existence via 
the operating system. When the registered application ceases to exist, the RAS 
410 notifies the ACS 104. If the application was the registered owner of the 
conference, then ACS 104 closes the conference and cleans up. In the above 
example, when the viewers leave the theatre and the theatre application 404 
ceases to exist, the RAS 410 reports it to the ACS 104. The ACS 104 looks at 
the table, sees that the registered owner is gone, and closes the conference.

Figures 5A and 5B represent a flow diagram 500 of the process of
changing a registered owner of a conference. With reference to Figure 5A, in
step 502 a first application starts the conference, and control passes to step 504.
In step 504, the owner of the conference is registered with the ACS 104 (shown
as arrow 412 in Figure 4). The ACS 104 enters the owner of the conference
in the lookup table 406 in step 506a (shown as arrow 414 in Figure 4). The
ACS 104 then informs the RAS 410 that it is interested in knowing when the
first application ceases to exist in step 506b (shown as arrow 416 in Figure 4).
Control then passes to decision step 507.

In decision step 507, it is determined whether the first application wants
to move the conference. If the first application does not want to move the 
conference, control stays with decision step 507. If the first application wants 
to move the conference, control passes to step 508. 

In step 508, the first application moves the conference ownership to a 
second application (shown as arrow 418 in Figure 4). Conference ownership 
in the second application is then passed to the ACS 104 in step 510 (shown as 
arrow 420 in Figure 4). 

Referring to Figure 5B, in step 512, the second application is registered 
as the owner of the conference, and the ACS 104 rewrites the lookup table 406 
to reflect the new owner of the conference in step 514a (shown as arrow 422 
in Figure 4). The ACS 104 then informs the RAS 410 that it is interested in 
knowing when the second application ceases to exist in step 514b (shown as 
arrow 424 in Figure 4). Control then passes to decision step 515. 

In decision step 515, it is determined whether the second application has
ceased. If the second application is still running, control remains in step 515. 
When the second application ceases, control passes to step 516.

In step 516, the RAS 410 reports the loss of the second application to
the ACS 104 (shown as arrow 426 in Figure 4). Control then passes to 
decision step 517. 

In decision step 517, it is determined whether the second application is 
registered as the owner of the conference in the lookup table. If the second 
application is not registered as the owner of the conference in the lookup table, 
control remains in step 517. If the second application is registered as the owner 
of the conference in the lookup table, control passes to step 518. In step 518, 
the ACS 104 closes the conference. 
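
The ownership transfer of Figures 5A and 5B can be sketched as a lookup table that is rewritten on transfer and consulted when the RAS reports a lost application. The class `OwnershipTable`, its method names, and the use of plain strings for applications and conferences are assumptions for illustration only.

```python
class OwnershipTable:
    """Hypothetical model of the lookup table and the reaction to RAS reports."""

    def __init__(self):
        self.owner_of = {}  # conference -> registered owner
        self.closed = []    # conferences that have been closed, in order

    def register_owner(self, conference, application):
        """Register or replace the owner (steps 504-506a and 512-514a)."""
        self.owner_of[conference] = application

    def on_application_lost(self, application):
        """Called when the RAS reports that an application has ceased to exist."""
        for conference, owner in list(self.owner_of.items()):
            if owner == application:  # only the registered owner matters
                del self.owner_of[conference]
                self.closed.append(conference)
```

In the box office example, losing the box office application after the transfer leaves the conference open, while losing the theatre application closes it.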

Returning to Figure 3, when a new conference starts, two new threads 
are generated in the ACS 104. The first new thread is a net listener thread 308. 
The net listener thread 308 receives audio data from audio clients 112 via the 
network. The second new thread is a mixer thread 310. The mixer thread 310 
performs actual audio mixing and delivers the mixed audio data to the audio 
clients 112 through the network. Actual audio mixing will be discussed below. 

The net listener thread 308 receives upstream audio data packets that are 
sent from the STB 112. The net listener thread 308 listens to the network. 
Whenever STBs 112 send data over the network via UDP/IP 128, the net
listener thread 308 gathers the audio data packets and passes them to the mixer 
thread 310 via a shared buffer (not shown) located between the net listener 
thread 308 and the mixer thread 310. 
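
The hand-off between the two threads can be sketched as a producer and a 
consumer sharing a queue. The sketch below is an assumption-laden illustration 
in Python: the function names are hypothetical, the shared buffer is modeled 
with the standard library queue.Queue, and the packets come from an in-memory 
list rather than from a UDP socket. 

```python
import queue
import threading


def net_listener(incoming_packets, shared_buffer):
    """Gather upstream packets and pass them to the mixer via the buffer."""
    for packet in incoming_packets:       # stands in for reading UDP datagrams
        shared_buffer.put(packet)
    shared_buffer.put(None)               # sentinel: no more packets


def mixer(shared_buffer, received):
    """Drain the shared buffer; a real mixer would mix and deliver here."""
    while True:
        packet = shared_buffer.get()
        if packet is None:
            break
        received.append(packet)


def run_demo(packets):
    """Start both threads for one conference and return what the mixer saw."""
    shared_buffer = queue.Queue()
    received = []
    threads = [
        threading.Thread(target=net_listener, args=(packets, shared_buffer)),
        threading.Thread(target=mixer, args=(shared_buffer, received)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received
```

Because there is a single producer and the queue is first-in, first-out, the 
mixer side sees the packets in the order in which the listener gathered them. 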

A PSA can be played from the application service 106 or from within 
the ACS 104. Figure 6 is a diagram 600 representing the play back of a PSA 
108. When PSAs 108 are played from the application service 106, one thread 
604, called thAudioSource, per PSA is generated in the application service 106. 
The thAudioSource thread 604 gets audio from the file (i.e., PSA 108) and 
sends it to the ACS 104 via UDP/IP. After the application service 106 sends 
the PSA 108 to the ACS 104, the process is the same as described for a set-top 
audio client 112 (i.e., the data is received by the net listener thread 308 and 
passed to the mixer thread 310 where it is mixed and sent to the designated 
audio clients 112 over the network via UDP/IP). 

As previously stated, an application can play a PSA 108 in the ACS 
104. When an application plays the PSA 108 in the ACS 104, the net listener 
thread 308 directly fetches the audio data from the file (i.e., PSA 108). This 
avoids sending the audio stream from the application service 106 to the ACS 
104 and so reduces network traffic. 

Audio Mixing and Delivery 

As previously stated, the mixer thread 310 performs actual audio mixing 
and delivers the mixed audio data to the audio clients 112 through the network 
via UDP/IP 128. The present invention provides audio mixing with distance- 
based attenuation. When an audio client (PSAs 108 or STBs 112) moves closer 
to another audio client (PSAs 108 or STBs 112), the sound gets louder, and 
when the audio client (PSAs 108 or STB 112) retreats, the sound gets quieter. 
A PSA 108 might be, for example, the sound of a snoring dragon. When an 
audio client represented by a STB 112 moves closer to the dragon, the snoring 
sounds louder, and when the STB 112 retreats, the snoring sound is quieter. 

The ACS 104 accomplishes this by implementing decay characteristics 
for categories of sounds. Figure 7 represents a graph 700 showing pre-defined 
decay factors for four categories of sounds. Graph 700 plots volume vs. 
distance for each pre-defined decay factor. 

The first pre-defined decay factor represents a sound of constant volume 
regardless of distance. This plot is identified as audioConstant 702. 
AudioConstant sounds are heard at the same volume from anywhere on the TV 
screen 126. The second pre-defined decay factor represents a sound of loud 
volume. This plot is identified as audioBig 704. AudioBig sounds can be 
heard by an audio client (PSA 108 or STB 112) anywhere on the TV screen 126 
(shown on TV screen 706), and even several screens away. The third pre- 
defined decay factor represents a sound of low volume. This plot is identified 
as audioSmall 712. AudioSmall sounds can only be heard by audio clients 
(PSAs 108 and STBs 112) who are near the sound source on the TV screen 126 
(shown on TV screen 714). The small sound (audioSmall 712) decays to zero 
inversely and more quickly than the big sound (audioBig 704). The last pre- 
defined decay factor represents a medium sound, i.e., a sound that falls 
between audioBig 704 and audioSmall 712. This plot is identified as 
audioMedium 708. AudioMedium sounds can be heard on approximately half 
of the TV screen 126, but not beyond the TV screen 126 (shown on TV screen 
710). Medium sounds decay linearly, in between the small and big sounds. 
Developers can also customize decay factor values. A plot of an exemplary 
custom decay factor 716 is also shown in graph 700. 
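
The four pre-defined decay factors can be sketched as functions from distance 
to a weight between 0.0 and 1.0. The sketch below is illustrative only. The 
text gives the qualitative behavior of each curve but no numeric values, so 
the units (distance in screen widths), the reach and scale parameters, and the 
choice of an inverse curve for audioBig are all assumptions, as are the 
function names. 

```python
def linear_decay(distance, reach):
    """Linear fall-off: 1.0 at distance 0, 0.0 at `reach` and beyond."""
    return max(0.0, 1.0 - distance / reach)


def inverse_decay(distance, reach, scale):
    """Inverse fall-off normalized to 1.0 at distance 0 and 0.0 at `reach`."""
    if distance >= reach:
        return 0.0
    floor = 1.0 / (1.0 + reach / scale)
    return (1.0 / (1.0 + distance / scale) - floor) / (1.0 - floor)


# Distances are in screen widths. Every numeric parameter is an assumption.
DECAY_FACTORS = {
    "audioConstant": lambda d: 1.0,                                 # same everywhere
    "audioBig": lambda d: inverse_decay(d, reach=3.0, scale=1.0),   # several screens
    "audioMedium": lambda d: linear_decay(d, reach=0.5),            # about half a screen
    "audioSmall": lambda d: inverse_decay(d, reach=0.25, scale=0.05),  # nearby only
}


def weight(decay_name, distance):
    """Look up the weighted value for a source heard from `distance` away."""
    return DECAY_FACTORS[decay_name](distance)
```

A custom decay factor would simply be another entry in the table, supplied by 
the developer as a function of distance. 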

When audio clients (PSAs 108 and STBs 112) are added to the 
conference, the application specifies the decay factor for that audio client (PSA 
108 or STB 112). As previously stated, audio data received from audio clients 
(STBs 112) is received via the net listener thread 308. The mixer thread 310 
performs the actual audio mixing and delivers the mixed audio data to the audio 
clients (STBs 112) through the network via UDP/IP 128. The actual audio 
mixing is accomplished by generating an audio mix table. An exemplary audio 
mix table 800 is shown in Figure 8. The mix table 800 contains weighted 
values for each source audio client (STB 112) in relationship to each target 
audio client (PSA 108 or STB 112) in the conference. A target audio client 
(STB 112) is the audio client that is receiving the sound. According to graph 
700, the target audio client (STB 112) always resides at location (0,0). A 
source audio client (PSA 108 or STB 112) is the audio client from which the 
sound is emanating. Weighted values for each source audio client (STB 112) 
are extracted from graph 700 according to the distance between the target audio 
client (STB 112) and the source audio client (PSA 108 or STB 112) using the 
decay factor specified for the source audio client (PSA 108 or STB 112) when 
that source audio client (PSA 108 or set-top box 112) was added to the 
conference. The weighted values range from 0.0 to 1.0, with 0.0 indicating no 
volume and 1.0 indicating maximum volume. 

The mix table 800 shows that the weight of a target audio client (STB 
112) to itself is 0.0. Thus the target audio client (STB 112) will not be able to 
hear its own echo. 

The audio mix to be delivered to target audio client audio1 802 is 0.0 
for audio client audio1 802, 1.0 for audio client audio2 804, and 0.7 for audio 
client audio3 806. This indicates that audio client audio1 802 will hear audio 
client audio2 804 at maximum volume and audio client audio3 806 at 70% of 
the maximum volume. Equation 808 represents the audio mix for target audio 
client audio1 802. Equations 810 and 812 represent the audio mix for target 
audio clients audio2 804 and audio3 806, respectively. 
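
The weighted sum for one target can be sketched as follows, using the row of 
weights given above for target audio client audio1 802. This is a minimal 
Python illustration, not the mixer thread itself: the function name, the 
dictionary layout, and the sample values in the frames are hypothetical. 

```python
def mix_for_target(weights, frames):
    """Weighted sum, sample by sample, of the source frames for one target.

    weights: source name -> weight between 0.0 and 1.0 (one mix-table row)
    frames:  source name -> list of PCM samples, all of equal length
    """
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for source, w in weights.items():
        if w == 0.0:
            continue                      # silent sources contribute nothing
        for i, sample in enumerate(frames[source]):
            mixed[i] += w * sample
    return mixed


# Row for target audio1: itself at 0.0, audio2 at 1.0, audio3 at 0.7.
audio1_row = {"audio1": 0.0, "audio2": 1.0, "audio3": 0.7}
# Made-up two-sample frames, one per source.
frames = {"audio1": [500, 500], "audio2": [100, -100], "audio3": [10, 20]}
mixed_for_audio1 = mix_for_target(audio1_row, frames)
```

The target's own samples never appear in its mix because its weight to itself 
is 0.0. 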

The mixer thread 310 also refines the mixed audio using the following 
functions: gain control, fading in/fading out, floating point operation 
elimination, mixing adaption, mixing cut-off and stream audio. The gain 
control function controls the gain to avoid transmitting excess energy audio data 
after calculating the weighted sum. The fading in/fading out function avoids 
the delivery of audio data in a step-wise manner to the speaker output (which 
results in discontinuity on the user side) by smoothing the audio data on fading 
in and fading out. The floating point operation elimination function avoids 
floating point operation by using pre-calculated weight functions instead of 
performing actual floating point multiplication. The mixing adaption function 
is used to adapt an actual mix calculation for a source audio client to the 
available CPU resources. The mixing cut-off function provides better 
scalability by allowing a mixing cut-off of three, in which the three nearest 
talking audio clients are selected for the actual mix. The stream audio function 
prepares stream audio for two purposes: (1) playing ambient background music, 
such as radio in cyber-space; and (2) using it as an audio source forwarded 
from another conference. An example of refining an audio mix using both the 
mixing adaption and mixing cut-off functions follows. 

After the mixer thread 310 mixes the audio data, the mixed PCM audio 
data packet is sent over the network via UDP/IP 128 to the corresponding audio 
clients (STBs 112). Whether the full active mix for each audio client (STBs 
112) is sent depends on the availability of CPU resources. If the CPU 
resources are busy, the active mix for any one audio client (STB 112) will be 
reduced. For example, if an audio client's active mix is equivalent to: 

audioX = 0.1*audio1 + 0.3*audio2 + 0.5*audio3 + 
         0.7*audio4 + 1.0*audio5 + 0.0*audioX, 

and adequate CPU resources were available, the entire active mix could be 
delivered to audioX. Alternatively, if CPU resources were busy, then the 
active mix would be reduced. The reduction might be to delete audio1 and 
audio2 since they are only heard by audioX at 10% and 30% of the maximum 
volume, respectively. 
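
The reduction just described can be sketched as follows. The sketch is a 
hedged illustration in Python, not the mixing adaption or mixing cut-off 
function itself: the function name and the cpu_busy flag are hypothetical, and 
the weight of each source is used as a stand-in for nearness, which holds only 
when the sources share a decay factor. The weights for audio1 and audio2 come 
from the example above; the other weights are illustrative. 

```python
def reduce_active_mix(weights, target, cpu_busy, cutoff=3):
    """Return the sources (and weights) kept in the active mix for `target`."""
    # The target never hears itself, and zero-weight sources are inaudible.
    audible = {source: w for source, w in weights.items()
               if source != target and w > 0.0}
    if not cpu_busy:
        return audible                    # deliver the entire active mix
    # Busy CPU: keep only the `cutoff` loudest (assumed nearest) sources.
    keep = sorted(audible, key=audible.get, reverse=True)[:cutoff]
    return {source: audible[source] for source in keep}


active_mix = {"audio1": 0.1, "audio2": 0.3, "audio3": 0.5,
              "audio4": 0.7, "audio5": 1.0, "audioX": 0.0}
full = reduce_active_mix(active_mix, "audioX", cpu_busy=False)
reduced = reduce_active_mix(active_mix, "audioX", cpu_busy=True)
```

With a cut-off of three, audio1 and audio2 are the sources dropped from the 
reduced mix, matching the example in the text. 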

Figure 9A is a flow diagram 900 representing the functionality of the 
mixer thread 310. Flow diagram 900 begins by receiving audio data from the 
net listener thread 308 and from PSAs 108 generated in the application service 
106 or the ACS 104 in step 902. Control then passes to step 904. 

In step 904, audio mixing is performed for each source audio client in 
the conference to provide spatialized audio. Control then passes to step 906 
where delivery of the actual audio mix to the target audio clients (STBs 112) 
is performed. 

Figure 9B is a flow diagram 910 representing the audio mixing process 
904. Flow diagram 910 begins by identifying the decay factor for each source 
audio client in step 912. Control then passes to step 914. 

In step 914, the distance between the target audio client and each source 
audio client is determined. Using the distances determined in step 914, a 
weighted value is extracted using the identified decay factors from step 912 for 
each source audio client in step 916. Control then passes to step 918. 

In step 918, the weighted values determined in step 916 are entered into 
a mix table for each source/target audio client pair. The actual mix values are 
calculated in step 920 for each target audio client. The resultant audio mix 
values are refined in step 922. 
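
Steps 912 through 918 can be sketched as a single table-building routine. The 
Python below is a minimal sketch under stated assumptions: the function names 
are hypothetical, positions are plain X, Y pairs, every client is treated as a 
possible target for simplicity, and a linear decay is used only because it is 
easy to check by hand. The refinement of step 922 is not shown. 

```python
import math


def linear(reach):
    """Return a decay function falling linearly from 1.0 to 0.0 at `reach`."""
    return lambda distance: max(0.0, 1.0 - distance / reach)


def build_mix_table(positions, decay_fns):
    """Build target -> {source -> weight} from positions and decay factors.

    positions: client id -> (x, y)
    decay_fns: client id -> function(distance) returning 0.0 to 1.0
    """
    table = {}
    for target, (tx, ty) in positions.items():
        row = {}
        for source, (sx, sy) in positions.items():
            if source == target:
                row[source] = 0.0                        # no self echo
                continue
            distance = math.hypot(sx - tx, sy - ty)      # step 914
            row[source] = decay_fns[source](distance)    # steps 912 and 916
        table[target] = row                              # step 918
    return table


positions = {"a": (0, 0), "b": (3, 4), "c": (0, 1)}
decay_fns = {"a": linear(10.0), "b": linear(10.0), "c": linear(2.0)}
mix_table = build_mix_table(positions, decay_fns)
```

Note that the table need not be symmetric: the weight of c as heard by a uses 
the decay factor of c, while the weight of a as heard by c uses the decay 
factor of a. 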

ACS Shell 

The ACS shell 110, written in TCL (a tool command language developed 
by John Ousterhout of the University of California, Berkeley), is an IDL client 
that connects to the ACS 104 and provides an interactive interface to monitor 
and control the full range of ACS functions. The ACS shell 110 enables 
developers to quickly prototype an application and operators to monitor and 
control audio conferences. The ACS shell 110 provides an ACS shell 
command prompt for prompting a user to enter ACS shell commands. The 
ACS shell prompt identifies the shell, time, and history, for example, acsshell 
<10:09am>[1]%. ACS shell commands enable the user to examine the 
execution of and interact with audio conferences and audio clients (PSAs 108 
and STBs 112). 

ACS shell commands that provide audio conferencing functionality are 
divided into four distinct classes. Figure 10 is a diagram 1000 representing 
program access and internal interfaces to the ACS shell audio conferencing 
classes, as well as the interrelationship among classes. The classes include an 
ACProxy class 1002, a PSA class 1004, and AudioConference and 
AudioConferenceService classes 1006. 

The easiest way to use the ACS 104 is through the ACProxy class 1002. 
The ACProxy class provides a higher level of functionality than the lower level 
AudioConference and AudioConferenceService classes 1006. The ACProxy 
class 1002 provides most, but not all, of the functionality implemented by the 
lower level classes 1006. The ACProxy class 1002 also calculates sound 
relationships (mixed audio) automatically when audio clients change position. 

Methods 1100 contained in the ACProxy class 1002 are shown in Figure 
11. The ACProxy methods 1100 enable the creation of a proxy audio 
conference. There are fourteen (14) methods in the ACProxy class 1002. The 
methods include: 

(1) ACProxy() method 1102; 
(2) ~ACProxy() method 1104; 
(3) AddClient() method 1106; 
(4) AddPSA() method 1108; 
(5) Audios() method 1110; 
(6) DemuteAudio() method 1112; 
(7) GetAudioLocation() method 1114; 
(8) GetConfInfo() method 1116; 
(9) MoveAudio() method 1118; 
(10) MuteAudio() method 1120; 
(11) RegisterOwner() method 1122; 
(12) RegisterOwnerByName() method 1124; 
(13) RemoveAudio() method 1126; and 
(14) UnregisterOwner() method 1128. 

The ACProxy() 1102 and ~ACProxy() 1104 methods allow the opening 
and closing of a proxy audio conference, respectively. Methods AddClient() 
1106 and AddPSA() 1108 add STB 112 and PSA 108 audio clients to the proxy 
audio conference, respectively. Audio clients (PSAs 108 and STBs 112) are 
identified using client IDs. The method Audios() 1110 lists the audio client ID 
numbers of all audio clients (PSAs 108 and STBs 112) in the proxy audio 
conference. The audio of a PSA 108 is enabled and disabled using methods 
DemuteAudio() 1112 and MuteAudio() 1120, respectively. 

A proxy audio conference registers ownership of the application to the 
ACS 104. The ACS 104 then pings the RAS 410 to see if the application 
continues to exist. To transfer conference ownership to another application, 
methods RegisterOwner() 1122 and RegisterOwnerByName() 1124 are invoked. 
The RegisterOwner() method 1122 transfers ownership of the proxy audio 
conference using an object reference. The RegisterOwnerByName() method 
1124 transfers ownership of the audio conference using the name of the 
application. The UnregisterOwner() method 1128 removes the previous 
ownership of the audio conference. 

The ACProxy class 1002 allows one to specify a TV screen location for 
an audio client (PSA 108 or set-top box 112) and calculates the changes in 
sound automatically when audio clients (PSAs 108 and STBs 112) move. This 
is accomplished by invoking the MoveAudio() method 1118. The X, Y 
coordinate location of an audio client (PSA 108 or STB 112) is displayed when 
the GetAudioLocation() method 1114 is invoked. 

To remove audio clients (PSAs 108 or STBs 112), the RemoveAudio() 
method 1126 is invoked. Audio conference information is displayed by 
invoking the GetConfInfo() method 1116. 

An example of how to add audio clients to a proxy conference by 
invoking methods from the ACProxy class 1002 is shown in Figure 15. Figure 
15 is a flow diagram 1500 representing the addition of an audio client to a 
proxy audio conference. Flow diagram 1500 begins by adding an audio client 
to the proxy audio conference in step 1502. The audio client can be a STB 112 
or a PSA 108. If the audio client is a STB 112, the AddClient() method 1106 
is invoked. If the audio client is a PSA 108, the AddPSA() method 1108 is 
invoked. The audio client ID and the decay factor (702, 704, 708, 712, or 716) 
must be specified in the parenthetical of the AddClient() method 1106 or the 
AddPSA() method 1108. Control then passes to step 1504. 

In step 1504, the audio client is located onscreen by invoking the 
MoveAudio() method 1118. If the audio client is a STB 112, the character 
representing the audio client is located onscreen. If the audio client is a PSA 
108, the sound source representation of the audio client is located onscreen. 
When locating an audio client onscreen, the audio client ID and the X, Y 
coordinates of the audio client must be specified in the parenthetical. The 
origin, (0,0), is the lower left corner of the TV screen. 
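
The call sequence of steps 1502 and 1504 can be sketched with an in-memory 
stand-in. The Python class below is hypothetical and mirrors only the order of 
the calls. It is not the ACProxy class 1002, the actual argument lists of 
which are not given here, and the default location of (0, 0) for a newly added 
client is an assumption. 

```python
class ProxyConferenceSketch:
    """In-memory stand-in that mirrors the ACProxy call sequence only."""

    def __init__(self):
        self.clients = {}

    def _add(self, kind, client_id, decay_factor):
        self.clients[client_id] = {
            "kind": kind, "decay": decay_factor, "location": (0, 0)}

    def add_client(self, client_id, decay_factor):    # cf. AddClient() 1106
        self._add("STB", client_id, decay_factor)

    def add_psa(self, client_id, decay_factor):       # cf. AddPSA() 1108
        self._add("PSA", client_id, decay_factor)

    def move_audio(self, client_id, x, y):            # cf. MoveAudio() 1118
        self.clients[client_id]["location"] = (x, y)

    def get_audio_location(self, client_id):          # cf. GetAudioLocation() 1114
        return self.clients[client_id]["location"]


conference = ProxyConferenceSketch()
conference.add_client(1, "audioMedium")   # step 1502, STB audio client
conference.add_psa(2, "audioBig")         # step 1502, PSA audio client
conference.move_audio(1, 120, 80)         # step 1504, origin at lower left
conference.move_audio(2, 300, 200)
```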

Referring back to Figure 10, the most complete method of using the 
ACS shell 110 is by accessing the PSA class 1004, and the AudioConference 
and AudioConferenceService classes 1006 directly. PSAs 108 are initiated by 
accessing the PSA class 1004. As previously stated, PSAs 108 can be files or 
audio lines. 

The PointSourceAudio class methods 1200 are shown in Figure 12. 
These methods allow the user to instance or delete a point source as well as 
play, pause, stop, and resume play of a point source. The PointSourceAudio 
class 1004 contains six (6) methods (1202-1212). The six methods include: 

(1) PointSourceAudio() method 1202; 
(2) ~PointSourceAudio() method 1204; 
(3) Play() method 1206; 
(4) Stop() method 1208; 
(5) Pause() method 1210; and 
(6) Resume() method 1212. 

The PointSourceAudio() method 1202 instantiates a PSA object. To 
play the audio source, the Play() method 1206 is invoked. To stop playing the 
audio source, the Stop() method 1208 is invoked. To resume playing the audio 
source, the Resume() method 1212 is invoked. The Pause() method 1210 
pauses the playing of the audio source. To clean up resources and close the 
device used by the PSA 108, the ~PointSourceAudio() method 1204 is 
invoked. 
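
The play, pause, resume, and stop behavior of a point source can be sketched 
as a small state machine. The Python class below is hypothetical and is not 
the PointSourceAudio class 1004. In particular, the rule that pausing has an 
effect only while playing, and resuming only while paused, is an assumption. 

```python
class PointSourceSketch:
    """Hypothetical playback state machine for one point source."""

    def __init__(self, source_name):
        self.source_name = source_name    # e.g. a file name or an audio line
        self.state = "stopped"

    def play(self):                       # cf. Play() 1206
        self.state = "playing"

    def pause(self):                      # cf. Pause() 1210
        if self.state == "playing":
            self.state = "paused"

    def resume(self):                     # cf. Resume() 1212
        if self.state == "paused":
            self.state = "playing"

    def stop(self):                       # cf. Stop() 1208
        self.state = "stopped"


dragon = PointSourceSketch("snoring_dragon")
dragon.play()
dragon.pause()
dragon.resume()
```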

Referring back to Figure 10, the audio conference methods provided by 
the AudioConferenceService and AudioConference classes 1006 are direct IDL 
calls to the ACS 104. Thus, the conference manager thread 302 handles these 
incoming method calls. The ACProxy class 1002 and the PointSourceAudio 
class 1004 are wrappers to these IDL interfaces. 

The AudioConferenceService methods 1300 are shown in Figure 13. 
The AudioConferenceService class 1006 contains five (5) methods (1302-1310). 
The five (5) methods include: 

(1) OpenConference() 1302; 
(2) CloseConference() 1304; 
(3) GetConferenceByTicket() 1306; 
(4) ListConference() 1308; and 
(5) HctAudioStat() 1310. 

The OpenConference() 1302 and CloseConference() 1304 methods 
create and close an audio conference. An audio conference is identified by a 
ticket number. To locate a conference name by ticket number, the 
GetConferenceByTicket() method 1306 is invoked. One can also obtain a 
listing of the online conferences by invoking the ListConference() method 1308. 
Information about STB 112 audio clients can be obtained by invoking the 
HctAudioStat() method 1310. The data provided for STB 112 audio clients 
includes such information as: 

(1) the host IP address of the ACS 104; 
(2) the UDP port on the ACS server that handles the UDP/IP audio data 
    packets; 
(3) the port on the set-top side that handles the UDP/IP audio data packets; 
(4) the ID of the set-top where this audioID is mapped to; 
(5) the number that identifies each audio conference; 
(6) the date and time that the audio client (112) was created; 
(7) the most recent access of the client or ping; 
(8) the most recent time that the ACS 104 received audio data from this 
    audio client (112); and 
(9) the most recent time that the ACS 104 sent audio data to this audio 
    client (112). 

The methods for the AudioConference class 1400 are shown in Figure 
14. There are sixteen methods in the AudioConference class 1006. They 
include: 

(1) NewAudio() 1402; 
(2) DeleteAudio() 1404; 
(3) RegisterOwner() 1406; 
(4) RegisterOwnerByName() 1408; 
(5) UnregisterOwner() 1410; 
(6) SetMixOne() 1412; 
(7) SetMix() 1414; 
(8) GetMixOne() 1416; 
(9) GetMix() 1418; 
(10) SetTimeOut() 1420; 
(11) GetTimeOut() 1422; 
(12) AudioIds() 1424; 
(13) AudioStat() 1426; 
(14) ConfStat() 1428; 
(15) HctAudioStat() 1430; and 
(16) Ping() 1432. 

The methods in the AudioConferenceService and AudioConference classes 
1300 and 1400, respectively, provide greater control over the audio conference 
than the ACProxy class methods 1100. As one can see from the lists of 
methods, one can open and close audio conferences, create and delete audio 
clients, set an automatic timeout, and set the volume levels between clients. 

The RegisterOwner() method 1406, the RegisterOwnerByName() 
method 1408, and the UnregisterOwner() method 1410 are similar to the 
ACProxy methods 1122, 1124, and 1128. For descriptions of these methods 
(1406, 1408, and 1410) refer to the discussion of the ACProxy methods (1122, 
1124, and 1128) given above. 

Methods NewAudio() 1402 and DeleteAudio() 1404 create an audio 
client for a given STB 112 and delete an audio client, respectively. The 
methods SetMixOne() 1412 and GetMixOne() 1416 set the volume and return 
the volume setting of the conversations between two audio clients (PSAs 108 
and STBs 112), respectively. The SetMix() method 1414 and the GetMix() 
method 1418 set the volume and return the volume setting between many audio 
client pairs. The SetTimeOut() method 1420 sets a number of seconds of 
inactivity, after which the conference is closed automatically. A conference can 
be set to never close, if desired. The GetTimeOut() method 1422 finds the 
length in seconds of the timeout. The Ping() method 1432 pings a given audio 
conference and returns an exception if the conference does not exist. 
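
The inactivity timeout can be sketched as follows. The Python class is a 
hypothetical illustration, not the ACS code: the use of None to mean that the 
conference never closes is an assumption, and timestamps are passed in by the 
caller (for example from a monotonic clock) so that the logic can be followed 
without waiting. 

```python
class InactivityTimeout:
    """Hypothetical inactivity timer for one audio conference."""

    def __init__(self, timeout_seconds, now):
        # timeout_seconds=None is assumed to mean "never close".
        self.timeout_seconds = timeout_seconds
        self.last_activity = now

    def touch(self, now):
        """Record activity, such as received audio data or a ping."""
        self.last_activity = now

    def expired(self, now):
        """Return True once the conference has been idle long enough to close."""
        if self.timeout_seconds is None:
            return False
        return now - self.last_activity >= self.timeout_seconds


timer = InactivityTimeout(timeout_seconds=30, now=0)
idle_29 = timer.expired(now=29)    # still within the timeout
idle_30 = timer.expired(now=30)    # idle for the full timeout
```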

The remaining methods (1424-1430) all deal with providing information 
about the conference. The AudioIds() method 1424 returns a list of audio IDs 
in a conference. The AudioStat() method 1426 returns information about an 
audio client, given the audio ID. The ConfStat() method 1428 returns 
information about an audio conference, and the HctAudioStat() method 1430 
returns information about an audio client, given the STB ID. 

Figure 16 is an exemplary flow diagram 1600 representing an audio 
conference in a service application using the lower level methods of the 
AudioConferenceService and AudioConference classes 1300 and 1400. The 
procedural steps in flow diagram 1600 are presented as a guide. Other 
procedural step distributions for a service application using these lower level 
methods are within the scope and spirit of the present invention. 

Flow diagram 1600 begins in step 1602. In step 1602 an audio 
conference in an application is opened by invoking the OpenConference() 
method 1302. The ticket number identifying the audio conference or audio 
conference ID is returned. Control then passes to step 1604. 

In step 1604, a set-top box audio client 112 is added to the conference 
by invoking the NewAudio() method 1402. The NewAudio() method 1402 
returns an audio ID representative of the set-top box 112. Additional audio 
clients can be created by invoking the NewAudio() method 1402 or the 
PointSourceAudio() method 1202. Control then passes to step 1606. 

In step 1606, the volume between audio clients is set by invoking the 
SetMixOne() method 1412. The audio ID of both the receiver and the sender, 
as well as the mix (i.e., the weighted factor) must be included in the 
parenthetical of the SetMixOne() method 1412. If one needed to set up the 
relative volumes between many viewers, the SetMix() method 1414 would be 
invoked. The SetMix() method 1414 combines multiple IDL calls into one call. 
Control then passes to step 1608. 

In step 1608, the audio client is removed by invoking the DeleteAudio() 
method 1404. To delete an audio client, the audio ID of the audio client to be 
deleted must be specified in the parenthetical of the DeleteAudio() method 
1404. Control then passes to step 1610, where the audio conference in the 
application is closed by invoking the CloseConference() method 1304. 
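
The sequence of steps 1602 through 1610 can be sketched with in-memory 
stand-ins for the two lower level classes. The Python below is illustrative 
only. The class names, method names, and argument orders are hypothetical and 
do not reproduce the IDL interfaces of the ACS 104; only the order of the 
calls follows flow diagram 1600. 

```python
import itertools


class AudioConferenceSketch:
    """Hypothetical stand-in for one audio conference."""

    def __init__(self, ticket, name):
        self.ticket = ticket
        self.name = name
        self.stb_by_audio_id = {}
        self.mix = {}                        # (receiver_id, sender_id) -> weight
        self._next_id = itertools.count(1)

    def new_audio(self, stb_id):             # cf. NewAudio() 1402
        audio_id = next(self._next_id)
        self.stb_by_audio_id[audio_id] = stb_id
        return audio_id

    def set_mix_one(self, receiver_id, sender_id, weight):   # cf. SetMixOne() 1412
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        self.mix[(receiver_id, sender_id)] = weight

    def set_mix(self, entries):               # cf. SetMix() 1414
        for receiver_id, sender_id, weight in entries:
            self.set_mix_one(receiver_id, sender_id, weight)

    def delete_audio(self, audio_id):         # cf. DeleteAudio() 1404
        del self.stb_by_audio_id[audio_id]
        self.mix = {pair: w for pair, w in self.mix.items()
                    if audio_id not in pair}


class AudioConferenceServiceSketch:
    """Hypothetical stand-in for the conference service."""

    def __init__(self):
        self.conferences = {}
        self._next_ticket = itertools.count(1)

    def open_conference(self, name):          # cf. OpenConference() 1302
        ticket = next(self._next_ticket)
        self.conferences[ticket] = AudioConferenceSketch(ticket, name)
        return self.conferences[ticket]

    def close_conference(self, ticket):       # cf. CloseConference() 1304
        del self.conferences[ticket]


service = AudioConferenceServiceSketch()
conference = service.open_conference("demo")             # step 1602
viewer_a = conference.new_audio("stb-a")                 # step 1604
viewer_b = conference.new_audio("stb-b")
conference.set_mix_one(viewer_a, viewer_b, 0.7)          # step 1606
conference.delete_audio(viewer_b)                        # step 1608
service.close_conference(conference.ticket)              # step 1610
```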

Conclusion 

While various embodiments of the present invention have been 
described above, it should be understood that they have been presented by way 
of example only, and not limitation. Thus, the breadth and scope of the present 
invention should not be limited by any of the above-described exemplary 
embodiments, but should be defined only in accordance with the following 
claims and their equivalents. 

What Is Claimed Is: 

1. An audio conference server (ACS) for enabling an application program 
to provide multi-point, weight controllable audio conferencing, comprising: 
    means for managing at least one audio conference, said at least one 
audio conference comprising a plurality of audio clients; 
    means for receiving audio data from said plurality of audio clients; 
    means for mixing said audio data to provide spatialized audio to said 
plurality of audio clients in said at least one audio conference, wherein said 
mixing means results in mixed audio data; and 
    means for delivering said mixed audio data to said plurality of audio 
clients in said at least one audio conference. 

2. The ACS of claim 1, wherein said mixing means includes means for 
providing distance-based attenuation according to sound decay characteristics. 

3. The ACS of claim 1, further comprising means for checking the status 
of a registered owner of said at least one audio conference to determine whether 
said at least one audio conference still exists. 

4. The ACS of claim 3, wherein said checking means includes a resource 
audit service, said resource audit service operable when said at least one audio 
conference is generated by a first application and is being used by a second 
application. 

5. The ACS of claim 1, wherein said plurality of audio clients includes set- 
top box (STB) audio clients and point source audio (PSA) audio clients. 

6. The ACS of claim 1, wherein said managing means comprises an ACS 
shell to allow a user to interactively interface with said ACS, said ACS shell 
including: 
    means for providing program access to high level methods for creating 
and managing a proxy audio conference; 
    means for providing program access to methods for creating and 
managing a plurality of PSA audio clients; and 
    means for providing program access to low level methods for creating 
and managing said at least one audio conference. 

7. The ACS of claim 2, wherein said means for providing distance-based 
attenuation according to sound decay characteristics comprises: 
    means for identifying a decay factor from one of a plurality of pre- 
defined decay factors and a customized decay factor for each of said plurality 
of audio clients, said plurality of pre-defined decay factors including 
        an audio big decay factor, 
        an audio small decay factor, 
        an audio medium decay factor, and 
        a constant decay factor; 
    means for determining distances between a target audio client and a 
plurality of source audio clients; 
    means for determining a plurality of weighted values for each of said 
source audio clients based on said identified decay factor and said distance 
between each of said source audio clients and said target audio client, wherein 
each of said weighted values corresponds to a source/target audio client pair; 
    means for generating a mix table for each of said source/target audio 
client pairs; 
    means for calculating an actual mix for said target audio clients; and 
    means for refining said actual mix for said target audio clients. 

8. The ACS of claim 7, wherein said refining means comprises: 
    a gain control function to avoid transmitting excess energy audio data; 
    a fade in/fade out function to avoid the delivery of said audio data in a 
step-wise manner to a speaker output; 
    a floating point operation elimination function to avoid the performance 
of floating point multiplication; 
    a mixing adaption function to adapt the actual mix calculation for said 
target audio client to available CPU resources; 
    a mixing cut-off function to select the nearest talking audio clients for 
the actual mix; and 
    a stream audio function to prepare stream audio for playing ambient 
background music or using an audio source forwarded from another 
conference. 

9. A method for enabling an audio conference server to provide an 
application program with multi-point, weight controllable audio conferencing, 
comprising the steps of: 
    (1) managing at least one audio conference, said at least one audio 
conference comprising a plurality of audio clients; 
    (2) receiving audio data from said plurality of audio clients; 
    (3) mixing said audio data to provide spatialized audio to said 
plurality of audio clients in said at least one audio conference, wherein 
said mixing means results in mixed audio data; and 
    (4) delivering said mixed audio data to said plurality of audio clients 
in said at least one audio conference. 

10. The method of claim 9, wherein said mixing step includes providing 
distance-based attenuation according to sound decay characteristics. 

11. The method of claim 9, further comprising the step of checking the 
status of a registered owner of said at least one audio conference to determine 
whether said at least one audio conference still exists. 

12. The method of claim 11, wherein said checking step includes a resource 
audit service, said resource audit service operable when said at least one audio 
conference is generated by a first application and is being used by a second 
application. 

13. The method of claim 9, wherein said plurality of audio clients includes 
set-top box (STB) audio clients and point source audio (PSA) audio clients. 

14. The method of claim 9, wherein step (1) comprises the step of providing 
program access to high level methods for creating and managing a proxy audio 
conference using an ACS shell. 

15. The method of claim 9, wherein step (1) comprises the step of providing 
program access to methods for creating and managing said point source audio 
using an ACS shell. 

16. The method of claim 9, wherein step (1) comprises the step of providing 
program access to low level methods for creating and managing said at least 
one audio conference using an ACS shell. 

17. The method of claim 10, wherein said step for providing distance-based 
attenuation according to sound decay characteristics comprises the steps of: 
    identifying a decay factor from one of a plurality of pre-defined decay 
factors and a customized decay factor for each of said plurality of audio clients, 
said plurality of pre-defined decay factors including 
        an audio big decay factor, 
        an audio small decay factor, 
        an audio medium decay factor, and 
        a constant decay factor; 
    determining distances between a target audio client and a plurality of 
source audio clients; 
    determining a plurality of weighted values for each of said source audio 
clients based on said identified decay factor and said distance between each of 
said source audio client and said target audio client, wherein each of said 
weighted values corresponds to a source/target audio client pair; 
    generating a mix table for each of said source/target audio client pairs; 
    calculating an actual mix for said target audio clients using said mix 
table; and 
    refining said actual mix for said target audio clients, wherein said 
refining step is used to avoid transmitting excess energy audio data, avoid the 
delivery of said audio data in a step-wise manner to a speaker output, avoid the 
performance of floating point multiplication, adapt the actual mix calculation 
for said target audio client to available CPU resources, select the nearest talking 
audio clients for the actual mix, and prepare stream audio for playing ambient 
background music or using an audio source forwarded from another 
conference. 

18. A computer program product comprising a computer useable medium 
having computer program logic recorded thereon for enabling an audio 
conference server (ACS) to provide an application program with multi-point, 
weight controllable audio conferencing, said computer program logic 
comprising: 
    means for enabling the computer to manage at least one audio 
conference, said at least one audio conference comprising a plurality of audio 
clients; 
    means for enabling the computer to receive audio data from said 
plurality of audio clients; 
    means for enabling the computer to mix said audio data to provide 
spatialized audio to said plurality of audio clients in said at least one audio 
conference, wherein said mixing means results in mixed audio data; and 
    means for enabling the computer to deliver said mixed audio data to said 
plurality of audio clients in said at least one audio conference. 

1 19. The computer program product of claim 18, wherein said means for

2 enabling the computer to mix said audio data to provide spatialized audio to

3 said plurality of audio clients in said at least one audio conference includes 

4 means for enabling the computer to provide distance-based attenuation 

5 according to sound decay characteristics.

1 20. The computer program product of claim 18, further comprising means 

2 for enabling the computer to check the status of a registered owner of said at

3 least one audio conference to determine whether said at least one audio 

4 conference still exists. 

1 21. The computer program product of claim 20, wherein said means for

2 enabling the computer to check the status of a registered owner of said at least 

3 one audio conference includes a resource audit service, said resource audit 

4 service operable when said at least one audio conference is generated by a first 

5 application and is being used by a second application.

1 22. The computer program product of claim 18, wherein said plurality of 

2 audio clients includes set-top box (STB) audio clients and point source audio

3 (PSA) audio clients. 

1 23. The computer program product of claim 18, wherein said means for 

2 enabling the computer to manage at least one audio conference comprises 

3 means for enabling the computer to provide an ACS shell to allow a user to 

4 interactively interface with said ACS, said ACS shell including: 

5 means for enabling the computer to provide program access to high 

6 level methods for creating and managing a proxy audio conference; 

7 means for enabling the computer to provide program access to methods 

8 for creating and managing a plurality of point source audio (PSA) audio clients; 

9 and 

10 means for enabling the computer to provide program access to low level 

11 methods for creating and managing said at least one audio conference. 

1 24. The computer program product of claim 19, wherein said means for

2 enabling the computer to provide distance-based attenuation according to sound 

3 decay characteristics comprises: 

4 means for enabling the computer to identify a decay factor from one of 

5 a plurality of pre-defined decay factors and a customized decay factor for each 






6 of said plurality of audio clients, said plurality of pre-defined decay factors 

7 including 

8 an audio big decay factor, 

9 an audio small decay factor, 

10 an audio medium decay factor, and 

11 a constant decay factor; 

12 means for enabling the computer to determine distances between a target 

13 audio client and a plurality of source audio clients; 

14 means for enabling the computer to determine a plurality of weighted 

15 values for each of said source audio clients based on said identified decay factor 

16 and said distance between said source audio client and said target audio client, 

17 wherein each of said weighted values corresponds to a source/target audio 

18 client pair; 

19 means for enabling the computer to generate a mix table for each of said 

20 source/target audio client pairs; 

21 means for enabling the computer to calculate an actual mix for said 

22 source audio clients; and 

23 means for enabling the computer to refine said actual mix for said 

24 source audio clients. 
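
The mix table and actual mix described above can be sketched as follows. This is an illustrative reading, not the claimed implementation: the dictionary keyed by (source, target), the function names, the frame layout (one list of samples per client), and the include_self option are assumptions for the example. The mix is computed per target client, following the wording of the abstract.

```python
import math


def build_mix_table(positions, weight_fn, include_self=False):
    """Return {(source, target): weight} for every client pair.

    positions: dict mapping client id -> (x, y, z).
    weight_fn: callable (source_id, distance) -> weight; hypothetical hook
               where the identified decay factor would be applied.
    """
    table = {}
    for src, src_pos in positions.items():
        for tgt, tgt_pos in positions.items():
            if src == tgt and not include_self:
                continue  # assumed default: a client does not hear itself
            table[(src, tgt)] = weight_fn(src, math.dist(src_pos, tgt_pos))
    return table


def actual_mix(table, frames, target):
    """Weighted sum of all source frames contributing to `target`.

    frames: dict mapping source id -> list of samples of equal length.
    """
    if not frames:
        return []
    length = len(next(iter(frames.values())))
    mixed = [0.0] * length
    for (src, tgt), weight in table.items():
        if tgt != target or src not in frames:
            continue
        for i, sample in enumerate(frames[src]):
            mixed[i] += weight * sample
    return mixed
```

Because the table depends only on positions and decay factors, a design along these lines would rebuild it when a client moves and reuse it for every audio frame in between.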

1 25. The computer program product of claim 24, wherein said means for 

2 enabling the computer to refine said actual mix for said source audio clients 

3 comprises: 

4 means for enabling the computer to provide a gain control function to 

5 avoid transmitting excess energy audio data; 

6 means for enabling the computer to provide a fade in/fade out function 

7 to avoid the delivery of said audio data in a step-wise manner to a speaker 

8 output; 

9 means for enabling the computer to provide a floating point operation

10 elimination function to avoid the performance of floating point multiplication; 

11 means for enabling the computer to provide a mixing adaption function 

12 to adapt the actual mix calculation for said target audio client to available CPU 

13 resources; 

14 means for enabling the computer to provide a mixing cut-off function 

15 to select the nearest talking audio clients for the actual mix; and

16 means for enabling the computer to provide a stream audio function to

17 prepare stream audio for playing ambient background music or using an audio 

18 source forwarded from another conference. 
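
Several of the refinement functions listed above can be illustrated together in one small sketch. Everything here is an assumption made for the example: the Q8 fixed-point scale, the 16-bit sample range, hard clamping as a stand-in for gain control, a linear per-frame ramp as the fade, and the function names. The source does not specify how any of these functions is realized.

```python
SHIFT = 8  # assumed Q8 fixed point: a weight of 1.0 is stored as 256
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed samples


def to_fixed(weight):
    """Convert a floating point weight once, outside the per-sample loop."""
    return int(round(weight * (1 << SHIFT)))


def nearest_talkers(distances, talking, limit):
    """Mixing cut-off: keep the `limit` closest sources that are talking.

    Lowering `limit` is one hypothetical way to adapt to scarce CPU time.
    """
    active = [src for src in distances if src in talking]
    return sorted(active, key=lambda src: distances[src])[:limit]


def mix_frame(frames, old_weights, new_weights):
    """Integer-only mix of one frame with fade and clamping.

    frames: dict source id -> list of integer samples of equal length.
    old_weights / new_weights: dict source id -> fixed-point weight at the
    start and end of the frame; a missing source counts as weight 0.
    """
    length = len(next(iter(frames.values()))) if frames else 0
    out = []
    for i in range(length):
        acc = 0
        for src, samples in frames.items():
            start = old_weights.get(src, 0)
            end = new_weights.get(src, 0)
            # Fade: ramp linearly from the old weight to the new one so the
            # speaker never receives a step change in level.
            w = start + (end - start) * (i + 1) // length
            # Integer multiply and shift replace a floating point multiply.
            acc += (samples[i] * w) >> SHIFT
        # Stand-in for gain control: clamp to the sample range.
        out.append(max(INT16_MIN, min(INT16_MAX, acc)))
    return out
```

A source that has just come into range would be passed with an old weight of 0, so its contribution ramps up across the frame instead of appearing at full level on the first sample.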

Spatialized Audio in a Three-Dimensional,
Computer-Based Scene 

Abstract 

A system and method for enabling an audio conference server (ACS) to 
provide an application program with multi-point weight controllable audio 
conferencing. The ACS manages a plurality of audio conferences, receives 
audio data from a plurality of audio clients, mixes the audio data to provide 
distance-based attenuation according to decay characteristics for each sound, 
and delivers the mixed audio data to a plurality of audio clients. Audio clients 
include set-top box (STB) audio clients and point source audio (PSA) audio 
clients. The ACS mixes the audio data by identifying a decay factor. Pre- 
defined decay factors include an audio big decay factor, an audio small decay 
factor, an audio medium decay factor, and a constant decay factor. One can
also develop a customized decay factor. A weighted value for a source audio 
client based on the identified decay factor and the distance between the source 
audio client and a target audio client is determined. A mix table is generated 
using the weighted values for each source/target audio client pair. Then an 
actual mix value for each target audio client is calculated using the mix table. 
The present invention also includes means for refining the actual mix value. 